Polyampholytes (PAs) are heteropolymers with long range Coulomb interactions. Unlike polymers with short range 
forces, PA energy levels have non-vanishing correlations and are thus very different from the Random Energy Model 
(REM). Nevertheless, if charges in the PA globule are screened as in a regular plasma, PAs freeze in REM fashion. 
Our results shed light on the potential role of Coulomb interactions in folding and evolution of proteins, which are 
weakly charged PAs, in particular making connection with the finding that sequences of charged amino acids in 
proteins are not random.

The freezing transition of heteropolymers, in which the number of thermodynamically relevant states goes from an exponentially large value ($O(e^N)$) in the random globule state to only a few ($O(1)$) conformations in the frozen state, has attracted a great deal of interest. In addition to providing an interesting problem in the statistical mechanics of disordered materials [1], this system is potentially relevant to the biologically important question of protein folding. Most previous investigations have focused on heteropolymers with short-range interactions. Recently, however, there has been renewed theoretical and experimental interest in polyampholytes (PAs), which are heteropolymers with charged monomers of both signs. It has been shown that, due to screening effects, PAs collapse to compact globules if their net charge is below a critical value. There is also some evidence from exact enumeration studies of short chains that dense globules of neutral PAs may have a freezing transition. However, it is unclear how long range (LR) interactions affect freezing, or whether the formalism developed for globular polymers with short range (SR) interactions remains applicable to the LR case.

The freezing transition of SR heteropolymers is most commonly described by the Random Energy Model (REM), although it is not always applicable even in this case [10]. As the principal underlying assumption of REM is the statistical independence of the energies of states (polymer conformations) over disorder (sequence of charges along the chain), we first examine the correlation of the energies and then discuss the resulting freezing transition. Our starting point is the Hamiltonian $\mathcal{H}$, in which each pair of monomers $(I, J)$ contributes an energy proportional to $B\, s(I)\, s(J)\, f(\mathbf{r}_I - \mathbf{r}_J)$, where $B$ is a constant, $I$ labels monomers along the chain, and $s(I) = \pm 1$ is the charge of monomer $I$. The range of interactions is indicated through $f(r)$, such that $f(r) = \Delta(r)$ for SR interactions, and $f(r) = 1/r^{d-2}$ for Coulomb forces in $d$ dimensional space. Finally, we only consider the case of maximally compact polymers, assuming that maximal density is maintained independently of Coulomb interactions, i.e. by an external box, poor solvent, or internal attractions, such that $R \sim N^{1/d}$.

The simplest characteristic of the statistical dependence of energies is the pair correlation between two arbitrary conformations $\alpha$ and $\beta$, given by

$$\langle E_\alpha E_\beta \rangle_c = \langle E_\alpha E_\beta \rangle - \langle E_\alpha \rangle \langle E_\beta \rangle = B^2 Q_{\alpha\beta}, \qquad (2)$$

with $Q_{\alpha\beta} = \sum_{I \neq J} f(\mathbf{r}_I^\alpha - \mathbf{r}_J^\alpha)\, f(\mathbf{r}_I^\beta - \mathbf{r}_J^\beta)$. In the familiar case of SR interactions, $Q^{\rm SR}_{\alpha\beta} = \sum_{I \neq J} \Delta(\mathbf{r}_I^\alpha - \mathbf{r}_J^\alpha)\, \Delta(\mathbf{r}_I^\beta - \mathbf{r}_J^\beta)$ is just the number of bonds in common between configurations $\alpha$ and $\beta$. Numerical simulations [10] indicate that in many cases the probability distribution for $Q^{\rm SR}_{\alpha\beta}$, i.e. $P_{\rm SR}(Q) = \sum_{\alpha\beta} \delta(Q - Q^{\rm SR}_{\alpha\beta})$, is sharply peaked at small $Q$. This happens because one can easily "hide" monomers by moving them only a small distance and decreasing their contribution to $Q^{\rm SR}$. Large statistical dependence is thus achieved only for conformations that are closely related. The validity of REM rests on the statistical rarity of such closely related conformations: REM is valid when configurations that are statistically dependent can be ignored in the large $N$ limit.
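The origin of Eq. (2) can be made explicit with a short calculation. The sketch below assumes independent, zero-mean charges $s(I) = \pm 1$, an even pair function $f(\mathbf{r}) = f(-\mathbf{r})$, and a pair-sum Hamiltonian with a normalization constant $\kappa$; the symbol $\kappa$ is introduced here for illustration only and can be absorbed into $B$.

```latex
E_\alpha = \kappa B \sum_{I \neq J} s(I)\, s(J)\, f(\mathbf{r}_I^\alpha - \mathbf{r}_J^\alpha),
\qquad
\langle s(I) \rangle = 0, \quad \langle s(I)\, s(J) \rangle = \delta_{IJ}

\langle E_\alpha \rangle = 0, \qquad
\langle s(I)\, s(J)\, s(K)\, s(L) \rangle = \delta_{IK}\delta_{JL} + \delta_{IL}\delta_{JK}
\quad (I \neq J,\; K \neq L)

\langle E_\alpha E_\beta \rangle_c
  = 2 \kappa^2 B^2 \sum_{I \neq J}
    f(\mathbf{r}_I^\alpha - \mathbf{r}_J^\alpha)\, f(\mathbf{r}_I^\beta - \mathbf{r}_J^\beta)
  = 2 \kappa^2 B^2\, Q_{\alpha\beta}
```

With the convention $2\kappa^2 = 1$ this is Eq. (2). The sequence average leaves only terms in which the same monomer pair appears in both conformations, which is why $Q_{\alpha\beta}$ measures how similarly the two conformations place each pair.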

By contrast, with long range interactions, the relevant parameter for judging statistical dependence is $Q^{\rm LR}_{\alpha\beta} = \sum_{I \neq J} \left[ |\mathbf{r}_I^\alpha - \mathbf{r}_J^\alpha| \cdot |\mathbf{r}_I^\beta - \mathbf{r}_J^\beta| \right]^{-(d-2)}$. While the geometric interpretation of $Q^{\rm LR}_{\alpha\beta}$ is not as clear as that of $Q^{\rm SR}_{\alpha\beta}$, it measures the similarity in contributions from monomer pairs $(I, J)$ in conformations $\alpha$ and $\beta$ to the overall energy. Unlike the SR case, polymeric bonds always keep monomers within the scale of LR interactions. Thus, for two conformations chosen at random, the overlap $Q^{\rm LR}_{\rm rand}$ may not be negligible (even if $Q^{\rm SR}_{\rm rand}$ is). The following scaling argument provides an estimate of the width of the probability distribution $P_{\rm LR}(Q) = \sum_{\alpha\beta} \delta(Q - Q^{\rm LR}_{\alpha\beta})$.

FIG. 1. Scaling of $Q_{\rm rand}$ and $Q_{\max}$ with $N$ for LR and SR interactions ($d = 3$). Power law scaling of the form $Q \sim N^\gamma$ indicates that $Q^{\rm LR}_{\rm rand}/Q^{\rm LR}_{\max}$ does not vanish in the thermodynamic limit, whereas $Q^{\rm SR}_{\rm rand}/Q^{\rm SR}_{\max}$ does.

FIG. 2. Probability distributions $P(Q^{\rm LR})$ and $P(Q^{\rm SR})$, obtained from 64-mers on a cubic lattice. Due to finite size effects, there is some residual overlap in the SR case (here peaked at 0.1). However, we expect that the SR residual overlap vanishes in the thermodynamic limit, while the LR overlap does not.

First, consider the maximum overlap, which occurs (for both LR and SR) when all elements are correlated (i.e. $Q_{\max} = Q_{\alpha\alpha}$ is the correlation of a configuration with itself). To compute this, we note that for each of the $N$ monomers, there is a contribution from $O(r^{d-1})$ monomers at a distance $r$ (for compact states in $d$ dimensions), resulting in $Q_{\max} \sim N \int dr\, r^{d-1} f(r)^2$. For SR interactions, this integral is dominated by contributions at a microscopic length scale (set by the interaction range) and we get $Q^{\rm SR}_{\max} \sim N$. For LR interactions, while contributions from monomers far away are smaller, there are more of them. For Coulomb interactions in $d < 4$, the integral is dominated by the longest distance, and for a polymer of size $R$, we get

$$Q^{\rm LR}_{\max} \sim N R^{d} / R^{2(d-2)} \sim N R^{4-d}.$$

We can use similar arguments for the overlap between two conformations chosen at random ($Q_{\rm rand}$). In fact, for the LR problem, $Q^{\rm LR}_{\max}$ and $Q^{\rm LR}_{\rm rand}$ scale identically, as both cases involve $O(N^2)$ pairs of monomers each giving a contribution $O(1/R^{2(d-2)})$, for a total of $Q^{\rm LR}_{\max} \sim Q^{\rm LR}_{\rm rand} \sim N^2 R^{2(2-d)}$. Moreover, as the main contribution to $Q^{\rm LR}_{\alpha\beta}$ comes from far away sites, this residual overlap is only weakly conformation dependent. The existence of a residual overlap changes the problem fundamentally from the SR case: REM is not valid as there is always a statistical dependence in $d < 4$.
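The two LR estimates can be checked against each other by inserting the compact-globule relation $R \sim N^{1/d}$. The short calculation below is a sketch of that substitution, with all microscopic lengths set to unity (an assumption made only to keep the notation light).

```latex
Q^{\rm LR}_{\max} \sim N \int^{R} dr\; r^{d-1}\, r^{-2(d-2)}
  = N \int^{R} dr\; r^{3-d} \sim N R^{4-d} \qquad (d < 4)

R \sim N^{1/d} \;\Rightarrow\;
Q^{\rm LR}_{\max} \sim N^{1 + (4-d)/d} = N^{4/d},
\qquad
Q^{\rm LR}_{\rm rand} \sim N^{2} R^{2(2-d)} \sim N^{2 + (4-2d)/d} = N^{4/d}
```

Both overlaps therefore grow with the same power of $N$, namely $N^{4/3}$ in $d = 3$, so their ratio stays finite as $N \to \infty$. For SR interactions the corresponding integral converges at short distances, which gives $Q^{\rm SR}_{\max} \sim N$ instead.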

Computer simulations support the above arguments. To examine a large range in $N$, we generated random conformations on a lattice by first choosing a radius $R$, and then enumerating random paths [12] on the set of lattice sites which are within $R$. $R$ was varied from 3 to 10 lattice sites, and the following results represent averages over 20 conformations for each $R$ value. Fig. 1 shows that the scaling exponents $\gamma$ defined by $Q \sim N^\gamma$ appear to be the same within error for random pairs of conformations, as well as the overlap of any conformation with itself. Furthermore, the fits agree well with the predictions $\gamma_{\max} = \gamma_{\rm rand} = 4/3$. By contrast, with SR interactions $\gamma^{\rm SR}_{\max} = 1$, while $\gamma^{\rm SR}_{\rm rand} \approx 0.75$ is distinctly smaller. We also calculated SR and LR overlaps $Q^{\rm SR}$ and $Q^{\rm LR}$ for 2000 pairs of 64-mer conformations ($d = 3$, cubic lattice). The resulting histograms, with overlaps normalized by the maximal value, are shown in Fig. 2. SR overlaps are peaked at small values whereas the LR overlaps are peaked closer to unity. Furthermore, the sharpness of the distribution suggests that $Q^{\rm LR}$ is approximately independent of the chosen pairs of conformations.
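The overlaps compared above can be computed directly from monomer coordinates. The following is a minimal sketch, not the code used for the figures: the function names `pair_kernel` and `overlap`, the unit lattice spacing, and the choice of a nearest-neighbour contact function for the SR case are assumptions made for illustration. The two 8-mers filling a 2 x 2 x 2 cube are toy conformations, far smaller than the 64-mers discussed in the text.

```python
import numpy as np

def pair_kernel(conf, kind, d=3):
    """Matrix f(r_I - r_J) for one conformation, with zeros on the diagonal."""
    conf = np.asarray(conf, dtype=float)
    diff = conf[:, None, :] - conf[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    off_diag = ~np.eye(len(conf), dtype=bool)
    f = np.zeros_like(dist)
    if kind == "SR":
        # Assumed contact function: 1 for sites one lattice spacing apart, else 0.
        f[off_diag] = np.isclose(dist[off_diag], 1.0).astype(float)
    elif kind == "LR":
        # Coulomb-like kernel 1 / r^(d-2), meaningful for d > 2.
        f[off_diag] = dist[off_diag] ** (-(d - 2))
    else:
        raise ValueError("kind must be 'SR' or 'LR'")
    return f

def overlap(conf_a, conf_b, kind, d=3):
    """Q_ab: sum over ordered pairs I != J of f(r_I^a - r_J^a) * f(r_I^b - r_J^b)."""
    return float((pair_kernel(conf_a, kind, d) * pair_kernel(conf_b, kind, d)).sum())

# Two different Hamiltonian walks on a 2 x 2 x 2 cube (toy 8-mers).
conf_a = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
          (0, 1, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1)]
conf_b = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
          (0, 1, 1), (0, 0, 1), (1, 0, 1), (1, 1, 1)]

# Normalize by the self-overlap, which plays the role of Q_max.
ratio_sr = overlap(conf_a, conf_b, "SR") / overlap(conf_a, conf_a, "SR")
ratio_lr = overlap(conf_a, conf_b, "LR") / overlap(conf_a, conf_a, "LR")
print(f"normalized SR overlap: {ratio_sr:.3f}")
print(f"normalized LR overlap: {ratio_lr:.3f}")
```

Because the sum runs over ordered pairs, every contact is counted twice, which cancels in the normalized ratios. For these toy chains the normalized LR overlap comes out closer to unity than the SR one, in line with the trend described for Fig. 2, although an 8-mer is far too small to say anything about scaling.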

Having demonstrated the residual overlap between energies of conformations with LR interactions, and hence the breakdown of REM, we go on to better characterize the density of states. This will take us a step closer to understanding the freezing of PAs. To describe the density of states, we use the following three characteristics: the annealed energy variance $\sigma^2_{\rm ann}$ (the width of the density of states for annealed disorder), the average quenched energy variance $\sigma^2_{\rm quen}$ (the width of the density of states for quenched disorder), and the quenched energy correlation function $g$ (the statistical dependence between states). These quantities are given by the formulae

$$\sigma^2_{\rm ann} = \langle \overline{E^2} \rangle - \langle \overline{E} \rangle^2, \qquad \sigma^2_{\rm quen} = \langle \overline{E^2} - \overline{E}^{\,2} \rangle, \qquad g = \langle \overline{E}^{\,2} \rangle - \langle \overline{E} \rangle^2,$$

where $\overline{\cdots}$ and $\langle \cdots \rangle$ denote averaging over conformations and sequences respectively. Note that these quantities are related by a mathematical identity, $\sigma^2_{\rm ann} = \sigma^2_{\rm quen} + g$.

FIG. 3. Mean and width of the energy spectra for 80 sequences of 36-mers, determined by full enumeration over all maximally compact conformations (see text for details).

In the annealed case, the energy variance is $\sigma^2_{\rm ann} = B^2 Q_{\max}$, since, in this case, all possible states can be accessed and thus the width of the energy spectrum must be maximal. This result is also easily extracted from Eq. (2) by averaging over conformations with $\alpha = \beta$. Averaging the same equation over all pairs of states $\alpha$ and $\beta$, we can find $g$: for $\mathcal{M}$ conformations, there are $\mathcal{M}$ pairs $\alpha = \beta$ which completely overlap ($Q_{\alpha\beta} = Q_{\max}$), but this is overshadowed by the remaining $\mathcal{M}(\mathcal{M} - 1)$ pairs with overlap $Q_{\alpha\beta} = Q_{\rm rand}$, resulting in $g \approx B^2 Q_{\rm rand}$. In addition to measuring the statistical dependence between states, $g = \langle (\overline{E})^2 \rangle_c$ also describes how the mean of the energy spectrum for a given sequence varies between sequences. Finally, the width of the energy spectrum for a typical sequence is

$$\sigma^2_{\rm quen} = \sigma^2_{\rm ann} - g \approx B^2 \left( Q_{\max} - Q_{\rm rand} \right).$$

This makes sense physically, as correlation (anticorrelation) in the energies should narrow (broaden) the width of the energy spectra. Also, we see that when there is no correlation ($g = 0$), $\sigma_{\rm ann} = \sigma_{\rm quen}$, as in the REM.
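The relation between the annealed variance, the average quenched variance, and $g$ is a purely statistical decomposition (total variance = mean within-sequence variance + variance of the per-sequence means), so it can be checked on synthetic data. The sketch below assumes the three quantities are defined as their verbal descriptions suggest, and uses an array `energies[s, a]` (sequence `s`, conformation `a`) filled with made-up Gaussian numbers rather than PA energies; the function name and the toy parameters are illustrative only.

```python
import numpy as np

def spectrum_statistics(energies):
    """Return (var_ann, var_quen, g) for energies[s, a]: sequence s, conformation a."""
    energies = np.asarray(energies, dtype=float)
    mean_per_seq = energies.mean(axis=1)   # conformational average, one per sequence
    var_per_seq = energies.var(axis=1)     # squared width of each sequence's spectrum
    var_ann = energies.var()               # variance over sequences and conformations
    var_quen = var_per_seq.mean()          # average quenched variance
    g = mean_per_seq.var()                 # sequence-to-sequence variance of the means
    return var_ann, var_quen, g

rng = np.random.default_rng(0)
# Toy "LR-like" data: large sequence-dependent shift of the mean, narrow spectra.
shifts = rng.normal(0.0, 3.0, size=(80, 1))
energies = shifts + rng.normal(0.0, 1.0, size=(80, 500))

var_ann, var_quen, g = spectrum_statistics(energies)
print("annealed variance:       ", var_ann)
print("quenched variance plus g:", var_quen + g)
```

The two printed numbers agree to rounding error for any input array, since the identity does not depend on the physics. The toy data are built so that $g$ exceeds the quenched variance, mimicking the LR situation in which the means scatter more than the individual spectra are wide.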

The following picture emerges from the above results. As $Q^{\rm SR}_{\rm rand} \approx 0$, we have $g \approx 0$ for the SR case above the freezing temperature, and the mean of the energy spectrum does not vary significantly between sequences. Also, the width of the spectrum for a given sequence is large (the maximum possible value, as in the annealed case). The variation of the means of the energy spectra between sequences, $g$, is much smaller than the typical width of each spectrum, $\sigma^2_{\rm quen}$; thus disorder is not important for SR interactions above freezing. Of course, below the freezing temperature, self averaging breaks down, and disorder is relevant. By contrast, for LR interactions, $Q_{\rm rand}$ does not vanish and is significant. We thus expect the widths of the energy spectra to be small and the means to vary widely from sequence to sequence.

The results of a computational test of the above scenario, obtained from the exact enumeration of all globular states of 36-mers on a cubic lattice ($d = 3$), are presented in Fig. 3. We see that for SR interactions, the means of the spectra are indeed well defined and their width (gray region) is large. For LR interactions, the means are poorly defined, with a variance between sequences which is greater than the widths of individual spectra (error bars).

Is the insight gained above sufficient to analyze the freezing transition in PAs? In general, freezing is governed by the low energy tail of the density of states $\rho(E) = \mathcal{M} P(E)$, where $\mathcal{M}$ is the total number of conformations, and $P(E)$ is the single level energy distribution. In the standard REM entropy crisis scenario, the system freezes in a microstate, much like a snapshot, at a temperature $T_f$ at which $\rho_T \sim 1$, where $\rho_T = \rho(E_T)$ is the density of states at the equilibrium energy $E_T$ at the temperature $T$.

The density of states in the high temperature regime is governed by $\sigma_{\rm ann}$, as can be seen by a high temperature expansion: The partition function $Z = {\rm tr}\left[\exp(-\beta \mathcal{H})\right]$ is first expanded in powers of $\beta = 1/T$, resulting in (after averaging over sequences) $-\beta F = \langle \ln Z \rangle = \ln \mathcal{M} - \beta \langle \overline{E} \rangle + \beta^2 \langle \overline{E^2} - \overline{E}^{\,2} \rangle / 2 + \cdots$. From this expression (and using the definition of $\sigma^2_{\rm quen}$), the entropy is calculated as $S(T) = \ln \mathcal{M} - \beta^2 \sigma^2_{\rm quen}/2 + \cdots$, where (as demonstrated earlier) for Coulomb interactions in $d = 3$, $\sigma^2_{\rm quen} \sim e^4 N^2 / R^2$, yielding

$$\rho_T \sim \mathcal{M} \exp\left[ -\frac{1}{2} \left( \frac{e^2 N}{T R} \right)^{2} \right].$$

From the structure of the series, we expect the high temperature expansion to break down for temperatures $T < T_D = e^2 N / R$. This temperature can also be obtained by regarding the polymer globule as a (non-polymeric) plasma of the same $N$ charges confined within the volume $R^3$. As the Debye screening length for this plasma is of the order $r_D \sim (T R^3 / N e^2)^{1/2}$, there are two regimes: For $T < T_D$, the plasma is fully screened as $r_D < R$. However, for $T > T_D$, $r_D > R$ and the charges are not screened. The latter regime is meaningless for a regular plasma, but describes the high temperature behavior of the polymer globule. It is not clear that, with the constraints of polymeric bonds, the scaling for a PA should be the same as that for a screened plasma at low temperatures. However, assuming that this is the case, the entropy can be estimated by noting that the plasma is composed of roughly $\mathcal{N} \sim R^3 / r_D^3 \sim (N e^2 / R T)^{3/2}$ independent Debye volumes. Assuming that the entropy reduction is proportional to $\mathcal{N}$, we finally conclude

$$\rho_T \sim \mathcal{M} \exp\left[ -c \left( \frac{e^2 N}{T R} \right)^{3/2} \right], \qquad (5)$$

where $c$ is a numerical constant. Note that Eq. (5) indicates a very sharp decrease of the density of states in the low energy tail, proportional to $\exp[-c' (\overline{E} - E)^3]$, which reflects the fine tuning of configurations necessary for screening.
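The cubic form of the tail follows from Eq. (5) by eliminating the temperature in favour of the energy. The steps below are a sketch under two assumptions: that Eq. (5) is read as an entropy $S(T) = \ln \rho_T$, and that $\overline{E}$ denotes the mean (infinite temperature) energy. The shorthand $A \equiv e^2 N / R$ is introduced here only for compactness.

```latex
S(T) = \ln \mathcal{M} - c\, A^{3/2}\, T^{-3/2}, \qquad A \equiv \frac{e^2 N}{R}

dE = T\, dS = \tfrac{3}{2}\, c\, A^{3/2}\, T^{-3/2}\, dT
\;\Rightarrow\;
\overline{E} - E_T = 3\, c\, A^{3/2}\, T^{-1/2}

\ln \rho(E) - \ln \mathcal{M} = -c\, A^{3/2}\, T^{-3/2}
  = -\frac{(\overline{E} - E)^3}{27\, c^2 A^3}
  \equiv -c' (\overline{E} - E)^3
```

The tail thus falls off faster than the Gaussian $\exp[-(\overline{E} - E)^2 / 2\sigma^2]$ of the unscreened regime once $\overline{E} - E$ is large, which is the "very sharp decrease" referred to above.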

Typically the number of conformations of a polymer scales as $\mathcal{M} \sim e^{\omega N}$, with $\omega$ of the order of unity. In the limit where the polymer is kept maximally compact by an external box, poor solvent, or internal attractions, such that $R \sim a N^{1/3}$, where $a$ is a monomeric length scale, $\omega$ is approximately the entropy of Hamiltonian walks. Freezing, which is signaled by $\rho_T \sim 1$, can take place in the unscreened regime only for short chains with $N < 1/\omega$. (The "apparent" freezing temperature for unscreened polymers grows as $N^{1/6}$.) In this case, further decrease of temperature will not lead to screening, of course. For longer chains, we predict freezing at an $N$-independent temperature of $T_f \sim e^2 / (a \omega^{2/3})$ in the screened regime. In this sense, the compact PA freezes in a phase transition that is similar to REM. We stress that this happens despite the unusual scaling of the width of the density of states, $\sigma \sim N^{2/3}$. The distinction between the two behaviors is important for understanding the results of lattice simulations, as it appears that 36-mers are in the short chain regime.
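Both freezing temperatures follow from the entropy crisis condition $\rho_T \sim 1$ with $\mathcal{M} \sim e^{\omega N}$ and $R \sim a N^{1/3}$. The sketch below spells this out; the factor $1/2$ in the unscreened exponent is taken from the Gaussian high temperature estimate and, like $c$, should not be trusted beyond scaling.

```latex
% Unscreened regime, \rho_T \sim \mathcal{M} \exp[-\tfrac{1}{2} (e^2 N / T R)^2]:
\omega N \simeq \frac{1}{2} \left( \frac{e^2 N}{T_f R} \right)^{2}
\;\Rightarrow\;
T_f \simeq \frac{e^2 N}{R \sqrt{2 \omega N}} \sim \frac{e^2}{a\, \omega^{1/2}}\, N^{1/6},
\qquad
T_f > T_D = \frac{e^2 N}{R} \;\Leftrightarrow\; N < \frac{1}{2 \omega}

% Screened regime, \rho_T \sim \mathcal{M} \exp[-c\, (e^2 N / T R)^{3/2}]:
\omega N \simeq c \left( \frac{e^2 N}{T_f R} \right)^{3/2}
\;\Rightarrow\;
T_f \simeq \left( \frac{c}{\omega} \right)^{2/3} \frac{e^2 N^{1/3}}{R}
\sim \frac{e^2}{a\, \omega^{2/3}}
```

In the unscreened case the factors of $N$ do not cancel, which gives the slow $N^{1/6}$ growth, and the consistency requirement $T_f > T_D$ restricts this solution to short chains. In the screened case $N^{1/3}/R$ is a constant, so $T_f$ is independent of chain length.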

We expect that the nature of the frozen state also depends on $T_f / T_D$. For freezing in the screened regime ($T_f < T_D$), the system looks much like that of the SR case, i.e. like a disordered version of a salt crystal. For freezing in the unscreened regime ($T_f > T_D$), we expect a smaller degree of antiferromagnetic ordering, consistent with the idea that freezing at a higher temperature leads to a state which is less energetically optimized.

Proteins are an important class of PAs. In the light of our findings in this work, we make here some concluding remarks about protein folding and evolution. Of the 20 natural amino-acids, three are positively charged (Lys, Arg, His), two are negatively charged (Asp, Glu), and the rest are neutral. Nevertheless, it is often assumed that LR interactions are not essential to proteins, as the screening length in biological solvents is often quite small. It is less clear that screening is also effective in compact globular configurations with little or no solvent in their interiors. Furthermore, secondary structural elements such as α-helices effectively reduce the conformational flexibility of proteins. Indeed, the conformation space of small proteins (i.e. 70-90 amino-acids) perhaps corresponds to that of lattice 27-mers [13], and small proteins are likely to be in the short chain regime with respect to LR interactions. Thus, while the total charge on a given protein may be small, in solvents with few counter ions, this may be sufficient to lead to a REM-violating correlated energy landscape, making the results obtained here relevant. Moreover, for the typical separation of charges in a globular protein (roughly 20 Å), and given a dielectric constant of order 5-10, and $\omega \approx 2$, the characteristic freezing temperature $T_f$ is of the order of (biologically relevant) room temperatures.
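To put numbers on this estimate, the scaling form $T_f \sim e^2 / (\epsilon\, a\, \omega^{2/3})$ can be evaluated in physical units. The sketch below assumes Coulomb's law in a uniform dielectric $\epsilon$, takes the charge separation quoted above as the length $a$, and sets the unknown numerical prefactor to one (the `prefactor` argument is a hypothetical knob, not a value from the text), so only the order of magnitude is meaningful.

```python
# Physical constants (rounded).
COULOMB_EV_ANGSTROM = 14.3996      # e^2 / (4 pi eps0) in eV * Angstrom
BOLTZMANN_EV_PER_K = 8.617333e-5   # k_B in eV / K

def freezing_temperature_kelvin(separation_angstrom, dielectric, omega, prefactor=1.0):
    """Scaling estimate T_f ~ prefactor * e^2 / (eps * a * omega^(2/3)), in kelvin."""
    coulomb_energy_ev = COULOMB_EV_ANGSTROM / (dielectric * separation_angstrom)
    return prefactor * coulomb_energy_ev / (omega ** (2.0 / 3.0) * BOLTZMANN_EV_PER_K)

# Inputs quoted in the text: 20 Angstrom separation, dielectric 5-10, omega = 2.
for eps in (5.0, 10.0):
    t_f = freezing_temperature_kelvin(20.0, eps, omega=2.0)
    print(f"dielectric {eps:4.1f}: T_f ~ {t_f:.0f} K")
```

With these inputs the estimate lands between several hundred and roughly a thousand kelvin, i.e. within a small factor of room temperature. The unknown prefactor could shift it either way, so this supports the order-of-magnitude statement and nothing finer.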



We have discussed how the mean of the density of states can vary greatly from sequence to sequence. It appears that a large contribution to this mean comes from the interaction between monomers that are not far apart along the sequence. For example, while next nearest neighbors along the chain can somewhat vary their spatial distance from each other, this does not remove their large contribution to the mean energy. This is why the conformational average energy depends strongly on the correlations between charges quenched along the sequence. For Coulomb interactions, chains with anti-correlated sequences have low mean energies. This is intriguing, considering the recent finding that protein sequences are indeed anti-correlated with respect to their charge. This indicates that perhaps protein evolution was dictated not solely by the degree of hydrophobicity of monomers (which depends on the degree of charge, not the sign), but by Coulomb effects as well.